Serial Analysis Of Gene Expression

Serial Analysis of Gene Expression (SAGE) is a transcriptomic technique used by molecular biologists to produce a snapshot of the messenger RNA population in a sample of interest in the form of small tags that correspond to fragments of those transcripts. Several variants have since been developed, most notably LongSAGE (a more robust version), RL-SAGE and, most recently, SuperSAGE. Many of these have improved the technique by capturing longer tags, enabling more confident identification of the source gene.


Overview

Briefly, SAGE experiments proceed as follows:
# The mRNA of an input sample (e.g. a tumour) is isolated, and a reverse transcriptase and biotinylated primers are used to synthesize cDNA from the mRNA.
# The cDNA is bound to streptavidin beads via interaction with the biotin attached to the primers, and is then cleaved using a restriction endonuclease called an anchoring enzyme (AE). The location of the cleavage site, and thus the length of the remaining cDNA bound to the bead, will vary for each individual cDNA (mRNA).
# The cleaved cDNA downstream from the cleavage site is then discarded, and the remaining immobile cDNA fragments upstream from cleavage sites are divided in half and exposed to one of two adaptor oligonucleotides (A or B) containing several components in the following order upstream from the attachment site: 1) sticky ends with the AE cut site to allow for attachment to cleaved cDNA; 2) a recognition site for a restriction endonuclease known as the tagging enzyme (TE), which cuts about 15 nucleotides downstream of its recognition site (within the original cDNA/mRNA sequence); 3) a short primer sequence unique to either adaptor A or B, which will later be used for further amplification via PCR.
# After adaptor ligation, the cDNA is cleaved using the TE to remove it from the beads, leaving only a short "tag" of about 11 nucleotides of original cDNA (15 nucleotides minus the 4 corresponding to the AE recognition site).
# The cleaved cDNA tags are then repaired with DNA polymerase to produce blunt-end cDNA fragments.
# These cDNA tag fragments (with adaptor primers and AE and TE recognition sites attached) are ligated, sandwiching the two tag sequences together, with adaptors A and B flanking them at either end. These new constructs, called ditags, are then PCR amplified using adaptor A- and B-specific primers.
# The ditags are then cleaved using the original AE and allowed to link together with other ditags, which are ligated to create a cDNA concatemer with each ditag separated by the AE recognition site.
# These concatemers are then transformed into bacteria for amplification through bacterial replication.
# The cDNA concatemers can then be isolated and sequenced using modern high-throughput DNA sequencers, and these sequences can be analysed with computer programs which quantify the recurrence of individual tags.
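
The final counting step can be illustrated with a minimal Python sketch. It is not the software used by any particular SAGE pipeline. It assumes the anchoring-enzyme site is CATG (NlaIII is a common choice, but the text above does not name the enzyme), takes the tag length of 11 from the description above, and uses hypothetical function names. It also ignores the rare case where the junction of two tags happens to recreate the AE site.

```python
from collections import Counter

AE_SITE = "CATG"  # assumption: anchoring-enzyme recognition site (e.g. NlaIII)
TAG_LEN = 11      # "about 11 nucleotides" per the protocol description

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def split_ditags(concatemer):
    """Return the pieces of a concatemer read that lie between two AE sites.

    The first and last pieces are dropped because they are not bounded by
    an AE site on both sides (in a real read they would be vector sequence).
    """
    parts = concatemer.upper().split(AE_SITE)
    return [piece for piece in parts[1:-1] if piece]


def tags_from_ditag(ditag, tag_len=TAG_LEN):
    """Return both tags of a ditag, each read outward from its own AE site."""
    if len(ditag) < 2 * tag_len:
        return []  # too short to hold two complete tags
    left = ditag[:tag_len]
    # The second tag sits on the opposite strand, so it is reverse-complemented.
    right = reverse_complement(ditag[-tag_len:])
    return [left, right]


def count_tags(concatemers, tag_len=TAG_LEN):
    """Count how often each tag occurs across a collection of concatemer reads."""
    counts = Counter()
    for read in concatemers:
        for ditag in split_ditags(read):
            counts.update(tags_from_ditag(ditag, tag_len))
    return counts
```

Calling `count_tags` on a list of sequenced concatemers returns a `Counter` that maps each tag to its number of occurrences, which is the tag-and-count list that the analysis works from.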


Analysis

The output of SAGE is a list of short sequence tags and the number of times each is observed. Using sequence databases, a researcher can usually determine, with some confidence, from which original mRNA (and therefore which gene) the tag was extracted. Statistical methods can be applied to tag and count lists from different samples in order to determine which genes are more highly expressed. For example, a normal tissue sample can be compared against a corresponding tumor to determine which genes tend to be more (or less) active.
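
As a rough illustration of these two analysis steps, the Python sketch below builds a tag-to-gene lookup from reference transcripts and compares normalised tag counts between two samples. It rests on several assumptions not stated in the text: the anchoring-enzyme site is CATG, the primers anneal at the poly-A tail so that the retained tag follows the 3'-most AE site, and all function names are hypothetical. The log2 fold change shown is a simple descriptive comparison, not a significance test.

```python
import math

AE_SITE = "CATG"  # assumption: anchoring-enzyme recognition site
TAG_LEN = 11      # tag length used in the protocol description


def virtual_tag(transcript, tag_len=TAG_LEN):
    """Predict the tag of a transcript: the bases after its 3'-most AE site."""
    seq = transcript.upper()
    pos = seq.rfind(AE_SITE)
    if pos == -1:
        return None  # no AE site, so this transcript yields no tag
    start = pos + len(AE_SITE)
    tag = seq[start:start + tag_len]
    return tag if len(tag) == tag_len else None


def build_tag_index(transcripts):
    """Map each predicted tag to the set of genes that could have produced it."""
    index = {}
    for gene, seq in transcripts.items():
        tag = virtual_tag(seq)
        if tag is not None:
            index.setdefault(tag, set()).add(gene)
    return index


def tags_per_million(counts):
    """Scale raw tag counts so that libraries of different size are comparable."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {tag: 1e6 * n / total for tag, n in counts.items()}


def log2_fold_change(reference, sample, pseudocount=1.0):
    """Return log2(sample / reference) per tag on tags-per-million values."""
    ref = tags_per_million(reference)
    smp = tags_per_million(sample)
    return {
        tag: math.log2((smp.get(tag, 0.0) + pseudocount)
                       / (ref.get(tag, 0.0) + pseudocount))
        for tag in set(ref) | set(smp)
    }
```

A tag that maps to more than one gene in the index is ambiguous, which is one reason the longer tags of later variants allow more confident gene assignment.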


History

In 1979 teams at Harvard and Caltech extended the basic idea of making DNA copies of mRNAs in vitro to amplifying a library of such copies in bacterial plasmids. In 1982–1983, the idea of selecting random or semi-random clones from such a cDNA library for sequencing was explored by Greg Sutcliffe and coworkers, and by Putney et al., who sequenced 178 clones from a rabbit muscle cDNA library. In 1991 Adams and co-workers coined the term expressed sequence tag (EST) and initiated more systematic sequencing of cDNAs as a project (starting with 600 brain cDNAs). The identification of ESTs proceeded rapidly, with millions of ESTs now available in public databases (e.g. GenBank). In 1995, the idea of reducing the tag length from 100–800 bp down to 10–22 bp helped reduce the cost of mRNA surveys. In that year, the original SAGE protocol was published by Victor Velculescu at the Oncology Center of Johns Hopkins University. Although SAGE was originally conceived for use in cancer studies, it has been successfully used to describe the transcriptome of other diseases and in a wide variety of organisms.


Comparison to DNA microarrays

The general goal of the technique is similar to that of the DNA microarray. However, SAGE sampling is based on sequencing mRNA output, not on hybridization of mRNA output to probes, so transcription levels are measured more quantitatively than by microarray. In addition, the mRNA sequences do not need to be known a priori, so genes or gene variants which are not known can be discovered. Microarray experiments are much cheaper to perform, so large-scale studies do not typically use SAGE. Quantifying gene expression is more exact in SAGE because it involves directly counting the number of transcripts, whereas spot intensities in microarrays fall in non-discrete gradients and are prone to background noise.


Variant protocols


miRNA cloning

MicroRNAs, or miRNAs for short, are small (~22 nt) segments of RNA which have been found to play a crucial role in gene regulation. One of the most commonly used methods for cloning and identifying miRNAs within a cell or tissue was developed in the Bartel Lab and published in a paper by Lau et al. (2001). Since then, several variant protocols have arisen, but most have the same basic format. The procedure is quite similar to SAGE: the small RNAs are isolated, linkers are added to each, and the RNA is converted to cDNA by RT-PCR. Following this, the linkers, which contain internal restriction sites, are digested with the appropriate restriction enzyme and the sticky ends are ligated together into concatemers. Following concatenation, the fragments are ligated into plasmids and used to transform bacteria to generate many copies of the plasmid containing the inserts. These may then be sequenced to identify the miRNAs present, and to analyse the expression level of a given miRNA by counting the number of times it is present, similar to SAGE.


LongSAGE and RL-SAGE

LongSAGE was a more robust version of the original SAGE, developed in 2002, which had a higher throughput, using 20 μg of mRNA to generate a cDNA library of thousands of tags (Saha, S., et al. (2002). "Using the transcriptome to annotate the genome." Nat Biotechnol 20(5): 508-512). Robust LongSAGE (RL-SAGE) further improved on the LongSAGE protocol with the ability to generate a library with an insert size of 50 ng mRNA, much smaller than the previous LongSAGE insert size of 2 μg mRNA, and using a lower number of ditag polymerase chain reactions (PCR) to obtain a complete cDNA library.


SuperSAGE

SuperSAGE is a derivative of SAGE that uses the type III endonuclease EcoP15I of phage P1 to cut 26 bp long sequence tags from each transcript's cDNA, expanding the tag size by at least 6 bp compared with the predecessor techniques SAGE and LongSAGE. The longer tag size allows for a more precise allocation of the tag to the corresponding transcript, because each additional base increases the precision of the annotation considerably. As in the original SAGE protocol, so-called ditags are formed, using blunt-ended tags. However, SuperSAGE avoids the bias observed during the less random LongSAGE 20 bp ditag ligation. By direct sequencing with high-throughput sequencing techniques (next-generation sequencing, e.g. pyrosequencing), hundreds of thousands or millions of tags can be analyzed simultaneously, producing very precise and quantitative gene expression profiles. Therefore, tag-based gene expression profiling, also called "digital gene expression profiling" (DGE), can today provide highly accurate transcription profiles that overcome the limitations of microarrays.
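
The effect of tag length on annotation precision can be made concrete by counting how many distinct sequences a tag of a given length can take. The short Python sketch below does this for the tag lengths mentioned in this article. The 11 bp value comes from the Overview, 26 bp from the SuperSAGE description, and 20 bp for LongSAGE is inferred from the "at least 6 bp" comparison above rather than stated directly. The function name is hypothetical.

```python
def tag_space(tag_len):
    """Number of distinct DNA sequences of a given length (4 bases per position)."""
    return 4 ** tag_len


# Tag lengths as described in this article (the LongSAGE value is inferred).
TAG_LENGTHS = {"SAGE": 11, "LongSAGE": 20, "SuperSAGE": 26}

for name, length in TAG_LENGTHS.items():
    print(f"{name:10s} {length:2d} bp  {tag_space(length):,} possible tags")
```

Each extra base multiplies the number of possible tags by four, so a longer tag is far less likely to match more than one transcript by chance.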


3'end mRNA sequencing, massive analysis of cDNA ends

In the mid-2010s several techniques combined with next-generation sequencing were developed that employ the "tag" principle for "digital gene expression profiling" but without the use of the tagging enzyme. The "MACE" approach (Massive Analysis of cDNA Ends) generates tags somewhere in the last 1500 bp of a transcript. The technique no longer depends on restriction enzymes and thereby circumvents bias related to the absence or location of the restriction site within the cDNA. Instead, the cDNA is randomly fragmented and the 3' ends are sequenced from the 5' end of the cDNA molecule that carries the poly-A tail. The sequencing length of the tag can be freely chosen. Because of this, the tags can be assembled into contigs and the annotation of the tags can be drastically improved. Therefore, MACE is also used for the analysis of non-model organisms. In addition, the longer contigs can be screened for polymorphisms. As UTRs show a large number of polymorphisms between individuals, the MACE approach can be applied to allele determination, allele-specific gene expression profiling and the search for molecular markers for breeding. The approach also allows alternative polyadenylation of the transcripts to be determined. Because MACE requires only the 3' ends of transcripts, even partly degraded RNA can be analyzed with less degradation-dependent bias. The MACE approach uses unique molecular identifiers to allow for identification of PCR bias.
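
The role of unique molecular identifiers (UMIs) can be sketched in a few lines of Python. This is a generic illustration of UMI-based duplicate collapsing, not the actual MACE software. It assumes each read has already been assigned to a transcript and carries a UMI string; the function name and input format are hypothetical, and sequencing errors within UMIs are ignored.

```python
from collections import defaultdict


def count_unique_molecules(reads):
    """Return {transcript_id: (raw_reads, unique_umis)} for an iterable of reads.

    Each read is a (transcript_id, umi) pair. Reads that share both values are
    treated as PCR copies of one original molecule and are counted once.
    """
    raw = defaultdict(int)
    umis = defaultdict(set)
    for transcript_id, umi in reads:
        raw[transcript_id] += 1
        umis[transcript_id].add(umi)
    return {t: (raw[t], len(umis[t])) for t in raw}
```

A transcript whose raw read count is much higher than its number of distinct UMIs has been over-represented by amplification, which is the PCR bias the identifiers are meant to expose.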


See also

* High-throughput sequencing
* Transcriptomics
** RNA-Seq
** DNA microarrays
** Expressed sequence tags


External links


* SAGEnet
* A review of the SAGE technique at the Science Creative Quarterly